CONTROLLED HIERARCHICAL FILTERING: MODEL OF 
NEOCORTICAL SENSORY PROCESSING 

ANDRAS LORINCZ 



Abstract. A model of sensory information processing is presented. The model assumes 
that learning of internal (hidden) generative models, which can predict the future and 
evaluate the precision of that prediction, is of central importance for information ex- 
traction. Furthermore, the model makes a bridge to goal-oriented systems and builds 
upon the structural similarity between the architecture of a robust controller and that of 
the hippocampal entorhinal loop. This generative control architecture is mapped to the 
neocortex and to the hippocampal entorhinal loop. Implicit memory phenomena, namely
priming and prototype learning, are emergent features of the model. Mathematical theorems
ensure stability and attractive learning properties of the architecture. Connections to
reinforcement learning are also established: both the control network and the network
with a hidden model converge to a (near) optimal policy under suitable conditions.
Falsifying predictions, including the role of the feedback connections between neocortical
areas, are made.
1. Introduction

Our thinking is best expressed by the words of Albert Szent-Györgyi [125], the
famous Hungarian Nobel Laureate: 'There is no real difference between structure and 
function; they are the two sides of the same coin. If structure does not tell us anything 
about function, it means we have not looked at it correctly.' 

Here we seek a general framework that explains the information processing
function of the brain, extends to the goal-oriented nature of information processing and
explains the structure given these functions. For the sake of clarity, our assumptions
about the function are stated first. Building on these assumptions, we then 'derive'
the building blocks of the structure. Finally, these blocks are mapped to the substrate.

1.1. Starting assumptions. The model is built on a few interlinked 'axioms': 

(1) The brain interacts with the environment, signals are received and responses are 
generated. 

(2) Signal detection corresponds to filtering of all possible detectable phenomena. 

(3) Response generation influences the environment, influence on the environment is 
goal oriented and aims to control environmental parameters. 

(4) Control of the environment is subject to optimization. 

1.2. 'Guiding principles'. Our starting assumptions are constrained as follows: 

The homunculus fallacy should be solved. Our thoughts are grounded on the hypothesis 
that representations do exist in the brain (see e.g. the debates about the Representational 
Theory of Mind and its modern extension, the Computational Theory of Mind |251 117j . 
but also JH). The use of representations can hardly be avoided in any computational 
modelling. Generally speaking, the processing of signals that may convey information can 
be considered as a transformation into another form that still carries the whole amount 
or just a piece of the original information. The environment feeds the system with some 
inputs and the system output represents (a part of) the environment. Whilst most models 
address the problem of coding inputs and making efficient internal representation, we are 
more concerned about the fundamental problem of making sense of these representations. 
In our view, the central issue of making sense or meaning is to provide answers to questions 
like 'what does it mean?' in terms of our past experiences, or 'how are they related?' in 
terms of known facts. In other words, making sense is inherently related to declarative 
memory. As a consequence, the homunculus fallacy (see, e.g., [116]), the claim that the internal
representation is meaningless without an interpreter, is of central importance. This
fallacy claims that all levels of abstraction require at least one further level to become the
corresponding interpreter. Unfortunately, the interpretation, according to the fallacy,
is just a new transformation and we are trapped in an endless regression.^



^We note that there can be more than one route to resolve the fallacy (see, e.g., [50]). Along the line of
classical black box modeling the fallacy does not arise at all, but the meaningful labeling of the blocks of
the model can be questioned.
1.2.1. Constraint of reconstruction. Our standpoint is that the paradox stems from the
vaguely described procedure of 'making sense'. The fallacy arises by saying that the
internal representation should make sense. One can turn the fallacy upside down by changing
the roles [73]: not the internal representation but the input should make sense. Our
proposal is that the input makes sense if the same (or similar) inputs have been experienced
before and if the input can be derived or regenerated by means of the internal
representation [73, 81]. All in all, the goal is to turn the infinite regression into a reconstructing
loop structure and shortcut the fallacy. According to this approach the internal representation
interprets the input by (re)constructing it. This function is more than mirroring the
environment. Interpretation-based reconstruction can fill in missing parts of spatio-temporal
patterns^, which includes the capacity of prediction.

1.2.2. Constraints of control and optimization. Interpretation is goal oriented and forms
a delicate 'perception-action loop': the actual goal requires (1) the sensing of environmental
parameters, (2) the ability to influence, i.e., to control those parameters, (3) the sensing
of the consequences of control, and so on. In turn, perceptual information is transformed, the
transformation is controlled and the control of the transformation is subject to the actual
goal. Because the transformation depends on the actual goal, control typically acts on a
partially observed environment; information is filtered.

Sensing and control of the environment consume energy, which should be minimized.
Minimization of energy consumption is a long-term task: short-term saving at
the cost of large long-term spending needs to be avoided and the long-term cumulated cost
is to be minimized. Such optimization problems are formulated within the framework of
reinforcement learning (RL)^. It is then necessary to consider control concepts subject to
principles developed in RL in partially observed environments. This challenging IBl] and 
generally computationally intractable problem ^1 should be addressed by the model. 

1.2.3. Architectural constraints on the neural level. There are constraints on the building 
blocks of the filtering and controlling system, such as 

(1) Locality: The architecture is made of connected simple computational units. Con- 
nected units are neighbors. Computations of any unit are based on information 
received from its neighbors. 

(2) Connections are directed and serve as filters. Connections possess tunable filtering 
strength or weights. Tuning of connections is subject to Hebbian-learning: signals 
of the two computational units at the two ends of a directed connection determine 
the adaptation of the weight of the connection. 

(3) Locality and Hebbian-learning concern all functions, including sensing, control and 
optimization. 
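The locality and Hebbian constraints listed above can be illustrated with a minimal sketch. It assumes the simplest product form of Hebbian learning (weight change proportional to the product of the signals at the two ends of a connection); the function name `hebbian_update` and the learning rate `rate` are hypothetical and are not taken from the model itself.

```python
def hebbian_update(weights, pre, post, rate=0.1):
    """Return updated weights; weights[i][j] is the connection from unit j to unit i.

    Each weight changes using only the signals of the two units it connects
    (assumed rule: delta_w = rate * post_i * pre_j).
    """
    return [
        [w_ij + rate * post[i] * pre[j] for j, w_ij in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Two input units, two output units, all weights initially zero.
W = [[0.0, 0.0], [0.0, 0.0]]
W = hebbian_update(W, pre=[1.0, 0.0], post=[0.5, 1.0])
```

Only connections whose input unit was active change here, which is the locality property in its simplest form.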



^Here, 'spatial' means information sets processed almost simultaneously. For example, almost simulta- 
neous retinotopic information or, information about different audio frequencies in the auditory cortex, or 
both of these, etc., are called spatial components. 

^For excellent introductory material on RL, see, e.g., [123].
1.2.4. Anatomical constraints. The architecture should match the known architectural
properties of the sensory processing areas of the neocortex up to the top, the hippocampus (HC)
and its surroundings, the hippocampal formation.

1.3. Origins of the model. Ever since the discovery of the central role of the hippocampus
and its adjacent areas in memory formation [117, 88], numerous studies and models
have dealt with the properties and the possible functions of the hippocampus and its
environment. The number of new experimental findings is increasing and they highlight the
complexity of the behavior of memory. Although views are strikingly different, they seem to
have their own, experimentally supported merits. The interested reader is referred to the
literature for excellent reviews on the hippocampus written, e.g., by Squire [121], Hasselmo and
McClelland [?], Redish [?] and O'Reilly and Rudy [?]. The majority of the models
have been developed to describe one part (mainly the CA3 field) of the hippocampus
(see, e.g., [?]). Attempts have been made to develop an integrating model of the HC
[?]. See also the works collected by Gluck [?]. It is known, though, that the
hippocampus is deeply embedded in the neocortical information flow through the entorhinal
cortex (EC).

This fact explains the emergence of a few EC-HC models like [?].
For example, McClelland et al. [?] emphasize the necessity of a dual system
for the seemingly contradictory tasks of learning specific properties and allowing for
generalization.

The controlled hierarchical filtering (CHF) model that we present here has its
origin in the long-standing proposal that the hippocampus and/or its environment serve
as a 'comparator' [?]. More recent works on this subject try to provide a
neuro-psychological account of anxiety and consciousness [?]. Other models use
a somewhat different nomenclature, e.g., the focus is placed on match/mismatch detection
[102, 93, 37]. Match/mismatch detection is closely related to familiarity/novelty detection,
another direction of theoretical efforts to describe medial temporal lobe areas [98, 141].
The form of novelty is probably polymorphous and there is increasing evidence that
different brain areas share the task of recognizing different aspects of novelty within the
same scene [?]. It seems that the encoding of novelty is distributed, which is a crucial
point of the CHF model.

The CHF architecture is an extension of our previous works [75, 76, 80]. It may be
worth noting that two falsifying predictions of that model, (a) large and tunable temporal
delaying capabilities of neurons of the dentate gyrus and (b) persistent activities at the deep
layers of the entorhinal cortex, have been reinforced recently by Henze et al. [?] and by
Egorov et al. [28], respectively.

The paper is constructed as follows. First (in Section 2), terminology is provided
and basic concepts are defined. The CHF model is detailed there. Section [?] deals with the
mapping of the control and reconstruction architectures to the entorhinal-hippocampal loop
and to neocortical areas. Section [?] discusses relations to other computational models, e.g.,
[?]. That section treats the learning and stability properties of the
architecture, the connections between the model architecture and RL, including
partially observed RL problems, the emerging neurobiological features, missing links
and some conjectures of the model. The paper ends with an Appendix containing some
of the mathematical details. Other mathematical details can be found in the cited
references, including the technicalities, which are made available as technical reports through^


Notations of the control field and notations of neural networks differ. In control
theory, input-output systems are considered. In graphical form, a box denotes the system
and arrows denote the system's input(s) and output(s). Processing occurs in the box
(Fig. 1(A)). In contrast, artificial neural networks consist of computational units,
the putative analogs of real neurons. The units, also called neurons, receive (provide)
inputs (outputs) through the connection structure, and this internal functioning is drawn
explicitly. Neurons execute simple computations, like summing up inputs, thresholding
and the like. The main part of the neural network performs distributed computation using
the connection structure, which performs (non-linear) filtering. This distributed filter system,
which may connect all neurons, is called the connection system, weights, or synapses.
In a neural network architecture different neural layers are distinguished. Connections
between these layers are explicitly drawn in most cases (Fig. 1(B)). Computations of
neural networks between their inputs and outputs can be given in the following condensed
form:

(1) $y = f(Wx)$

where input and output are denoted by $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, respectively, the linear
transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is represented by matrix $W \in \mathbb{R}^{m \times n}$, the connections, and
function $f$ denotes a component-wise non-linearity. If this function is the identity function,
then we have a linear network. Here, a simplified notation will be used: neural layers
will be denoted by horizontal thick lines. Any particular set of connections between two
layers will be represented by a single arrow. A feedforward linear network is depicted in
Fig. 1(C). The graphical form of a network with component-wise non-linearity is shown in
Fig. 1(D). Different transformations may exist between two layers. Recurrent connections
(also called 'recurrent collaterals') target the same layer where they originate from.
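As a small illustration of the condensed form y = f(Wx), the following sketch computes the output of one layer in plain Python. The helper name `layer`, the example matrix and the choice of `math.tanh` as the component-wise non-linearity are assumptions made for the example only.

```python
import math


def layer(W, x, f=None):
    """Compute y = f(W x); f acts component-wise, identity (linear network) if None."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return y if f is None else [f(v) for v in y]


# W maps R^3 to R^2, so it has 2 rows and 3 columns.
W = [[1.0, -1.0, 0.0],
     [0.5, 0.5, 2.0]]
x = [1.0, 2.0, 3.0]

y_linear = layer(W, x)                # linear network, as in Fig. 1(C)
y_nonlinear = layer(W, x, math.tanh)  # component-wise non-linearity, as in Fig. 1(D)
```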

2.1. The control model. Our control problem is formulated in terms of state dependent
directions pointing towards target positions. A mapping which renders a direction (change
of state, or change of state per unit time, i.e., velocity) to each state is called a speed-field.
A particular speed-field is given, for example, by the difference vectors between the
target state and all other states. An important feature of a speed-field is that motion is not
specified in time. The control task is defined as moving according to the speed-field at
each state. This control task is called speed-field tracking (SFT). For a review on SFT,
see, e.g., [35]. The SFT formulation is flexible because different fields can be designed for the
same task and, also, it allows motions to speed up or to slow down simply by scaling
the speed-field. A speed-field can be seen as a local tool for path planning (see, e.g., [?] and
references therein). The control task of path (also called trajectory) tracking is, however,



^http://arxiv.org
Figure 1. Notations

A: Control representation of input output systems. Computations are per- 
formed in the box. 

B: Neural network representation of computations: inputs are received by 
input neurons and are (non-linearly) transformed by connections and the 
output neurons, which provide the outputs. An output neuron could be an 
input neuron of the next processing stage. 

C: Linear neural transformation. Input: x, transformation W, output: y, 
y = Wx. 

D: Non-linear neural transformation, $y = f(Wx)$. Graphical form: arrow
with a circle. More than one transformation may exist between layers.
A recurrent network is a neural layer with a transformation that targets the
same layer.

Terminology in the context of neurobiology: 

Layer corresponds to a given area, sometimes called field or subfield, such
as the CA3 and CA1 regions of the hippocampus, or the different layers of
the neocortex. Transformations may correspond to (i) excitatory synapses
connecting layers or targeting neurons of the same layer, such as the
recurrent collaterals and the associative connections of the CA3 subfield of the
hippocampus and the intra-layer excitatory connections of layers II and III
of the neocortex or (ii) inhibitory synapses between layers or within layers,
such as the rich interneural networks in the hippocampus.






different. The difference between SFT and ordinary trajectory tracking is shown in Fig. 2.
One might say that SFT is less stringent, less precise and puts more emphasis on the
global goal than on the local perturbations.
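The speed-field idea can be made concrete with a short sketch. It assumes the example field mentioned in the text (difference vectors pointing to the target state) and an idealized plant that realizes the prescribed change of state exactly; the names `speed_field` and `track`, the Euler step `dt` and the perturbation arguments `kick_at` and `kick` are hypothetical.

```python
def speed_field(x, target, scale=1.0):
    """Desired change of state per unit time at state x (scaled difference vector)."""
    return [scale * (t - xi) for t, xi in zip(target, x)]


def track(x0, target, scale=1.0, dt=0.01, steps=1000, kick_at=None, kick=None):
    """Euler integration of an ideal plant that follows the speed-field at each state."""
    x = list(x0)
    for k in range(steps):
        if kick_at is not None and k == kick_at:
            x = [xi + d for xi, d in zip(x, kick)]  # external perturbation of the state
        v = speed_field(x, target, scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [1.0, 2.0]
end_plain = track([0.0, 0.0], target)
end_kicked = track([0.0, 0.0], target, kick_at=100, kick=[0.5, -0.5])
end_fast = track([0.0, 0.0], target, scale=2.0, steps=500)
```

After the perturbation the plant does not return to its earlier path; it simply follows the field from wherever it is, and it still approaches the target. Scaling the field by two reaches the same neighborhood of the target in half the number of steps.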

The dynamic equation of a system is a (possibly continuous) set of differential 
equations. This set of equations determines the change of state per unit time given the 
external forces acting upon the system, including the control action. Inverse dynamics 
works in the opposite way: given the state and (desired) change of state, inverse dynamics 
provides the control vector. If the inverse dynamics is perfect then inserting the control 
vector into the dynamic equation the desired change of state is achieved. The controller, 
in turn, maps state and speed to control action. 
Figure 2. Trajectory tracking (TT) versus speed field tracking
(SFT) 

TT: Horizontal lines represent different nearby trajectories. Black line: 
initial trajectory. Upon perturbations the system (the 'plant') should re- 
turn to the original trajectory. 

SFT: Horizontal lines represent a small part of the speed-field to be 
tracked. Black line: initial speed trajectory. Upon perturbation, the plant 
adjusts its speed to the speed of the actual neighborhood. 



Inverse dynamics, however, changes if kinematic parameters (dimensions such as
length, width, etc.) or parameters of the dynamics (e.g., weight, flexibility and so on)
change: inverse dynamics is almost never perfect. Moreover, it is well known that
approximate inverse dynamics can give rise to instabilities [?]. Therefore, a robust extension
of speed-field tracking is needed. Such a robust control architecture is described here.

Let $x$ and $\dot{x}$, where the 'dot' denotes the temporal derivative, represent the state and the
change of the state per unit time (the momentum) of the plant, respectively. Let

(2) $\dot{x} = f(x, u)$

denote the dynamics of the plant, i.e., in state $x \in \mathbb{R}^n$ and under control $u \in \mathbb{R}^p$, the
momentum of the plant becomes $\dot{x} \in \mathbb{R}^n$, described by the nonlinear function
$f : \mathbb{R}^{n+p} \to \mathbb{R}^n$. Let $v(x) \in \mathbb{R}^n$ denote the desired change of state (the desired momentum).
Assume that we have an approximate feedforward model of the inverse dynamics:

(3) $u_{ff} = u_{ff}(x, \dot{x}, v(x))$

which is an input-output system receiving inputs (the state, the momentum, the desired
momentum) and providing output (the control vector $u_{ff} : \mathbb{R}^{3n} \to \mathbb{R}^p$). If this control
vector is used directly to influence the plant then it is called the feedforward controller. The
perfect feedforward control vector $u_{ff}^{*}$ makes the plant produce momentum $v(x)$:

(4) $v(x) = f(x, u_{ff}^{*}(x, \dot{x}, v(x)))$

If the feedforward control vector is imprecise then an error (a difference between the desired
momentum and the experienced momentum) $e_c = v(x) - \dot{x}$ appears. To correct this error,
the (same or another) model of the inverse dynamics can be used. This error correcting
controller is called the feedback controller^: its inputs are the state, the momentum and the
desired compensation, i.e., $e_c$. The output of the feedback controller, called the feedback
control vector $u_{fb}$, is subject to temporal integration. The time integrated
and amplified output of this controller is used to correct the feedforward control vector:

(5) $\dot{w} = \Lambda u_{fb}$

(6) $u = u_{ff} + w$

i.e., $u = u_{ff} + \Lambda \int u_{fb}\, dt$, where $\Lambda$ denotes the (amplifying) gain factor. The feedback vector
$u_{fb}$ vanishes when $u_{ff} = u_{ff}^{*}$. Assume an approximate inverse dynamics of the following
form:

(7) $u = \Phi(x, \dot{x})$.

A particular form of the feedback control vector is simply a comparator that vanishes
provided that $u_{ff}(x, \dot{x}, v(x))$ is perfect:

(8) $u_{fb} = \Phi(x, v(x)) - \Phi(x, \dot{x})$

This scheme is depicted in Fig. 3.
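The effect of the integrated feedback term can be seen in a toy simulation. This is only a sketch under strong assumptions that are not part of the text: a scalar plant whose momentum is `plant_gain * u`, an inverse model `model_gain * momentum` that is deliberately wrong, a feedforward controller that simply evaluates this model at the desired momentum, a speed-field pointing to a fixed `target`, and a momentum measurement that arrives one Euler step late. All names and numerical values are hypothetical.

```python
def momentum_error(gain, plant_gain=0.5, model_gain=1.0,
                   target=1.0, dt=0.01, steps=100):
    """Relative momentum error |v - xdot| / |v| at the last of `steps` Euler steps."""
    x, w, xdot_meas = 0.0, 0.0, 0.0
    for _ in range(steps):
        v = target - x                                  # desired momentum v(x)
        u_ff = model_gain * v                           # feedforward from the imperfect model
        u_fb = model_gain * v - model_gain * xdot_meas  # comparator of Eq. (8)
        w += dt * gain * u_fb                           # integrate and amplify the feedback
        u = u_ff + w                                    # total control vector, Eq. (6)
        xdot = plant_gain * u                           # scalar toy plant in place of Eq. (2)
        x += dt * xdot
        xdot_meas = xdot                                # momentum is sensed one step late
    return abs(v - xdot) / abs(v)


without_feedback = momentum_error(gain=0.0)   # feedforward controller alone
with_feedback = momentum_error(gain=20.0)     # feedforward plus integrated feedback
```

With the gain set to zero the plant produces only half of the desired momentum, because the inverse model is off by a factor of two. With the integrated feedback switched on, the remaining mismatch in this toy setting is a few percent of the desired momentum.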

2.2. From control architecture to reconstruction networks. Our control architecture
can be related to a reconstruction network. To see this, first a particular (state
dependent but linear) form of the inverse dynamics is assumed [128]:

(9) $\Phi(x, \dot{x}) = A(x)\dot{x} + b(x)$.

It has been shown that one can use the feedback controller in 'feedforward position' without
affecting stability properties [128]:

(10) $u_{ff} = \Phi(x, v(x)) - \Phi(x, \dot{x})$

In this case the feedforward controller will never be perfect. Explicit modelling of $b(x)$
is unnecessary given that this quantity cancels in this comparison-based controller:

(11) $u_{ff}(x, \dot{x}, v(x)) = u_{fb}(x, \dot{x}, v(x)) = A(x)(v(x) - \dot{x})$.

This simplified architecture, which is built of comparators, is depicted in Fig. 4. We note
that (i) the control scheme is capable of controlling plants of any order (Appendix 5.2)
and (ii) it has attractive global stability properties [128].
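The cancellation of the offset term in the comparator follows in one step from the state dependent linear form of the inverse dynamics assumed in Eq. (9). The short worked line below uses only the symbols already defined in this section.

```latex
\begin{align*}
\Phi(x, v(x)) - \Phi(x, \dot{x})
  &= \bigl(A(x)\,v(x) + b(x)\bigr) - \bigl(A(x)\,\dot{x} + b(x)\bigr) \\
  &= A(x)\,\bigl(v(x) - \dot{x}\bigr).
\end{align*}
```

Only the state dependent matrix $A(x)$ has to be modelled; any error in $b(x)$ is irrelevant for a controller built from such differences.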

Figure 4 can be further simplified (i) by assuming a first order plant, i.e., a plant
with dynamical equation $\dot{x} = f(u)$ being independent of the actual state and (ii) by
neglecting the feedforward controller.^ Now, the experienced variable is $\hat{x}$, whereas the
desired variable is the desired state denoted by $x$. The corresponding architecture is shown
in Fig. 5(A). Figure 5(B) depicts a loop made of neural network layers with the same

^It is to be noted that there is reasonable freedom in the functional form of the feedback and
feedforward controllers [128].

^Restriction to first order plants can be released by a change of notations (Appendix 5.1). The control
architecture can work without the feedforward controller, but, according to computational experiments,
noise sensitivity increases considerably [128].
Figure 3. Robust controller for speed-field tracking tasks

A: The model of the inverse dynamics receives as inputs the actual state $x$,
the momentum $\dot{x} = dx/dt$, and the desired momentum at the actual state
$v(x)$. The output of the model is the feedforward control vector $u_{ff} =
u_{ff}(x, \dot{x}, v(x))$. The feedforward control vector may need corrections. The
feedback controller, which receives the state, the momentum and
the desired compensation $e_c = v(x) - \dot{x}$, serves this purpose. The output
of the feedback controller is integrated over time, multiplied by the gain
factor $\Lambda$, and the result is added to the feedforward control vector to form
the approximate control vector $u$.

B: The feedback controller is composed of two simplified models of the
inverse dynamics. Their effects cancel and, in turn, feedback control action
disappears when the feedforward controller is perfect, i.e., when control
vector $u_{ff}$ produces the desired momentum: $\dot{x} = v(x)$. These models
have two arguments, the state and the momentum. The first model
receives the actual state and the desired momentum. The output of
the model makes a positive contribution. The second model uses the actual
state and the actual momentum. The difference of the two outputs, $u_{fb} =
\Phi(x, v(x)) - \Phi(x, \dot{x})$, is the feedback control vector.



dynamical properties (Appendix 5.3). From now on, let $W$ and $Q$ denote the 'bottom-up'
(BU) and the 'top-down' (TD) connections of the reconstruction network, respectively.
There is, however, a subtle difference between the two architectures: the control network
starts from a planned desired state $x$ and acts upon the plant to experience that state.
Figure 4. Robust speed-field tracking controller using differences
between desired and experienced quantities

Feedforward and feedback controllers are the same; the output of the latter
is integrated over time, amplified and added to the output of the former
to control the plant. Both controllers make use of the simplified inverse
dynamics of Fig. 3(B) and both controllers receive the desired
compensation $e_c = v(x) - \dot{x}$ as input. The scheme produces globally stable control
under suitable conditions (see the Appendix).



The reconstruction network experiences the state (i.e., the input) $x$ and acts upon a
hidden layer to produce the internal representation $h$ of the network, which generates
a reconstructed input $\hat{x}$ that should match the experienced input. The reconstruction
network is an auto-associator equipped with a hidden layer [?]. The reconstruction
network is also a comparator that minimizes the reconstruction error $e = x - \hat{x}$. It is worth
noting that the sign of the difference is the opposite of that in the control architecture. The
reason is that the reconstruction network follows the environment, whereas the control
architecture manipulates it. The reconstruction network

(1) generates the reconstructed input $\hat{x}$ via the 'top-down' (TD) transformation, which
receives the hidden internal representation $h$ as input,

(2) compares the input with the reconstructed input and produces the reconstruction
error $e = x - \hat{x}$,

(3) processes the reconstruction error via the 'bottom-up' (BU) transformation and
corrects the internal representation by that. (In continuous time, adding up
'corrections' is equivalent to the temporal integration of the error.)

Under certain conditions (matrix $WQ$ of Fig. 5(B) should be positive definite; see the
Appendix) error compensation converges and the network relaxes.



(12) $\dot{h} = W(x - Qh)$
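The relaxation of the reconstruction loop can be sketched numerically. The example below integrates the dynamics of Eq. (12) with simple Euler steps. The matrices are assumptions chosen for illustration: `Q` is an arbitrary 3 by 2 top-down matrix and `W` is set to its transpose, which makes the product WQ positive definite; the helper names `matvec` and `relax` and the step size `dt` are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def relax(W, Q, x, dt=0.1, steps=300):
    """Euler integration of dh/dt = W (x - Q h), starting from h = 0."""
    h = [0.0] * len(W)
    for _ in range(steps):
        x_hat = matvec(Q, h)                       # top-down reconstruction of the input
        e = [xi - xh for xi, xh in zip(x, x_hat)]  # reconstruction error
        dh = matvec(W, e)                          # bottom-up transformed error
        h = [hi + dt * di for hi, di in zip(h, dh)]
    return h, matvec(Q, h)


Q = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W = [[1.0, 0.0, 1.0],   # transpose of Q (assumed choice)
     [0.0, 1.0, 1.0]]
x = [2.0, -1.0, 1.0]    # equals Q applied to the hidden vector [2, -1]

h, x_hat = relax(W, Q, x)
```

Because this input lies in the subspace spanned by the columns of `Q`, the hidden activity settles at the vector that regenerates the input and the reconstruction error vanishes.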

Figure 5. Control architecture for first order plant and the equivalent
reconstruction network

A: Control architecture for first order plants and without feedforward controller.

For a first order plant, the desired quantity is the desired state, whereas the
experienced quantity is the experienced state. For the sake of comparison
with subfigure B, desired state and experienced state are denoted by $x$ and
$\hat{x}$, respectively.

B: The corresponding reconstruction network.

Reconstruction network with the same dynamical properties. Input (vector)
$x$ is provided to the network. Input is compared to the 'reconstructed
input' (vector) $\hat{x}$, which is generated by the internal representation (vector) $h$
through the 'top-down' (TD) transformation (matrix) $Q$. Mismatch (vector)
$x - \hat{x}$ is delivered to correct the internal representation via the 'bottom-up'
(BU) transformation (matrix) $W$. Correction is achieved by temporal
integration (i.e., adding up the correcting term and applying recurrent
self-excitations) at the level of the internal representation. Note the switch
between experienced (sensed) and desired (to be matched) quantities.



2.3. Extended reconstruction network. The reconstruction network of Fig. 5(B) can
be extended to fulfill particular constraints and computational tasks. The extended
network is depicted in Fig. 6. The working of the network can be understood as follows:

Sensory input vector $x$ is compared to the reconstructed input vector $\hat{x}$. The error
$e = x - \hat{x}$ is transformed by the bottom-up (BU) transformation matrix $W$ and forms the
BU transformed error $s$. BU transformation maximizes BU information transfer in order
to facilitate reconstruction. The BU transformed error is passed to the internal representation
layer through transformation matrix $N$ (the role of this transformation shall be discussed
later) and is added to the internal representation $h$ of the 'hidden' (or model) layer. The
activity of the hidden layer is maintained by the diagonal elements of recurrent matrix $M$.
Considering the BU error correction, matrix $M$ can serve temporal integration.

Beyond this temporal integration, off-diagonal elements of associative matrix $M$
can perform temporal prediction [?].
Figure 6. Extended and controlled reconstruction network architecture

Arrow with circle denotes non-linear transformation. $x$ and $\hat{x}$: input and
reconstructed input, $W$ and $Q$: bottom-up (BU) and top-down (TD)
transformation, $s$: BU processed reconstruction error with maximized information,
$h$: activity vector of the hidden (model) layer, $N$: transformation from the
BU processed reconstruction error layer to the model layer, $P = (QN)^{-1}$:
BU matrix of the inner loop, $f(P\hat{x})$: BU gate releasing vector, with
sparsifying non-linearity $f(\cdot)$, $B$ and $v$: tools of control. The reconstruction
network can be considered as the environment of the control network, which
provides the control vector $Bv$; $v$ is the desired momentum. Black:
reconstruction network architecture. Black and gray together: controlled
reconstruction network.

Reconstruction vector $\hat{x}$ is generated by TD matrix $Q$. Note that any column of
matrix $Q$ could be equal to (different) individual inputs. In this case, the reconstructed input
will be the optimal linear combination of the individual inputs. However, the learning
principles of this TD matrix can be more sophisticated than such a fast imprinting-like
encoding.

Our specific assumption on TD matrix $Q$ is constrained by our proposal on the
resolution of the homunculus fallacy: the function of the hidden layer should be
spatio-temporal pattern completion. Spatial pattern completion using prewired or experienced
correlations can complete missing pieces of information. Representations using a single (i.e.,
positive) sign empower the learning of correlations. Algorithms capable of finding positive
components are called positive (non-negative) matrix factorization algorithms [?].
Positive matrix factorization together with pattern completion algorithms may imply [124]
recognition by components [?]. That is, TD matrix $Q$ is assumed to accomplish positive
matrix factorization. Matrix $Q$, which plays a major role in determining the relaxed
hidden activity, is considered the long-term memory of the network.
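To make the notion of positive (non-negative) matrix factorization concrete, here is a small sketch that factorizes a non-negative data matrix whose columns are inputs into a non-negative matrix `Q` (columns play the role of components) and non-negative hidden activities `H`. It uses the widely known multiplicative update rules for the squared error as a stand-in; this is an assumption for illustration and not the Hebbian tuning procedure of the model. All function names, the toy data and the parameters `iters`, `eps` and `seed` are hypothetical.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(X, Q, H):
    """Squared reconstruction error between X and Q H."""
    R = matmul(Q, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmf(X, k, iters=200, eps=1e-9, seed=0):
    """Non-negative factorization X ~ Q H with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    Q = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Qt = transpose(Q)
        num, den = matmul(Qt, X), matmul(matmul(Qt, Q), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(Q, matmul(H, Ht))
        Q = [[q * a / (b + eps) for q, a, b in zip(qr, nr, dr)]
             for qr, nr, dr in zip(Q, num, den)]
    return Q, H


# Three inputs (columns) built from two non-negative parts.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 2.0]]
Q, H = nmf(X, k=2)
```

The updates only multiply by non-negative ratios, so entries that start positive never change sign; this is what keeps both the components and the hidden activities single-signed.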

An inner loop is introduced in Fig. 6 to perform non-linear noise filtering or
sparsification of the BU processed error. The sparsification matrix $P = (QN)^{-1}$ transforms
reconstruction vector $\hat{x}$. Components of the output of this transformation form the gate
opening vector. If a component of this gate opening vector is below a certain threshold
then the corresponding component of the BU processed error is diminished. There are
theoretical works underpinning the idea: it has been shown that wavelet denoising [85]
can be generalized to different databases using independent component analysis (ICA)
[?]. For a recent review on ICA, see [51]. ICA maximizes information
transfer under the assumption that there are hidden factors and the probability
distribution of these factors is equal to the product of the probability distributions of the
individual components. Estimations based on ICA components are local: each component
can be estimated separately.
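The gating step of the inner loop can be written down in a few lines. The sketch assumes a hard gate: a component of the BU processed error is kept when the matching component of the gate opening vector reaches the threshold and is multiplied by a damping factor otherwise. The function names, the threshold, the damping value and the identity matrix used for `P` are hypothetical choices for the example.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def gate_error(s, P, x_hat, threshold=0.5, damping=0.0):
    """Diminish components of s whose gate opening value P x_hat is below threshold."""
    gate = matvec(P, x_hat)  # gate opening vector
    return [si if gi >= threshold else damping * si for si, gi in zip(s, gate)]


P = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]        # identity, only to keep the example readable
x_hat = [0.9, 0.1, 0.6]      # reconstructed input
s = [1.0, 2.0, -3.0]         # BU processed error

s_sparse = gate_error(s, P, x_hat)
```

Only the second component is suppressed here, because its gate opening value (0.1) is below the threshold.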

Thresholding of independent components, like thresholding of wavelet components,
decreases the structureless noise content of the reconstructed input [?]. The
method is called sparse code shrinkage (SCS). A relation between SCS and an overcomplete
reconstruction network with sparsifying non-linearity [95] has also been established.


Theoretical considerations [76, 80] indicate that (i) matrices $P$, $Q$, $N$, $W$ can
be tuned by Hebbian means, (ii) the reconstruction process diminishes some of the
components of the input by projecting onto the subspace determined by the columns
of matrix $Q$, (iii) matrix $P$ performs ICA on the reconstructed input and thus denoising
concerns the subspace of matrix $Q$, (iv) matrix $W$ performs ICA on the input and the two
ICA transformations may differ, (v) upon tuning, matrix $QNW$ becomes the identity
matrix and (vi) matrix $P$ becomes equal to matrix $W$ and both perform the same ICA
transformation, (vii) the speed of tuning for matrices $P$, $Q$ and $W$ should be such that
tuning of TD matrix $Q$ is the slowest and tuning of BU matrix $W$ is the fastest. We shall
return to these points later. In turn, this network does the following: (i) learns to predict
the (near) future, (ii) maximizes BU information transfer, and (iii) filters noise.

2.4. Working of the extended architecture under control. 

2.4.1. Working of the extended architecture. The extended and controlled architecture 
satisfies the following non-linear equations: 



where the BU error $e = x - \hat{x}$ is the mismatch between input and reconstructed input.
This error vector undergoes BU transformation (matrix W) and forms the BU error vector.
Components of the BU error vector (Eq. 13b) are subject to sparsification (function
$f_{P\hat{x}}(\cdot)$), where matrix P is determined by matrix Q. Matrix M (Eq. 13c) is
responsible for temporal integration, for prediction and for component based completion
of spatio-temporal patterns jHU].

2.4.2. Control of the extended architecture. The reconstruction network of Fig. El can be
controlled by acting on the internal representation. This is the fourth term (D) of the
r.h.s. of Eq. 13. Control adds an extra contribution, i.e., Bv, to the hidden layer. Vector
v, which can be a function of the internal representation h (v = v(h)), is the desired
momentum. From the point of view of the controller, either $f(W(x - \hat{x}))$ or its linear
transform $N f(W(x - \hat{x}))$ can serve as the experienced momentum. Clearly, there is no
guarantee that the experienced momentum will match the desired one unless matrix B is
properly tuned. For a learning system, this match cannot be guaranteed, and thus a robust
control scheme that can enforce the desired quantities becomes a necessity.

2.5. Reconstruction network controlled by a robust controller. The control
scheme of Section 2.1 has several advantages (see Appendix 5.1):

(1) As a result of the robustness, the learning of controlling is simplified: if the control
action is sign-proper, i.e., the control action moves the system in the right direction,
then control is ultimately uniformly bounded and globally stable. Moreover,
the bound of the tracking error can be made arbitrarily small.

(2) 'Learning-by-doing' can be accomplished during controlling, whether or not
sign-properness is fulfilled: the actual state and momentum are to be associated
with the actual control vector.

The reconstruction network extended by the robust controller is depicted in Fig. 7(A). The
working of the architecture in Fig. 7 can be understood as follows. First, let us consider
the lower reconstruction network. This is the 'plant' (i.e., the 'environment') to be
controlled. The controller is the upper reconstruction network. The lower network passes its
experienced BU error ($\dot{x}$) to the upper network through matrix U. The effect of matrix U
may be modulated by vector Vh. The BU input to the upper network is $U\dot{x}$, which - up to
a linear transformation - is equal to $\dot{h}$. The desired momentum is provided by a particular
transformation (not shown explicitly in any of the figures, but present in the neocortical
structure [12, 13]) that originates from the internal representation and targets the
corresponding reconstruction error layer of the same reconstruction network. We assume
that this top-down signal is subtracted from the experienced momentum. The difference of
the desired and experienced quantities forms the input to the feedforward controller. The
output of the feedforward controller is $u_{ff} = \hat{B}(h)(v(h) - \dot{h})$. Note that matrix
$\hat{B}$ contains both BU and TD transformations. According to the working of the
reconstruction network, the input, i.e., $v(h) - \dot{h}$, undergoes SCS noise filtering and
temporal integration in the upper reconstruction network to make the reconstructed input.
Apart from a linear transformation, this reconstructed input is the input to the feedback
controller (Eqs. EUHl and Fig. HJ). The output of the feedback controller is
$u_{fb} = \int \hat{A}(h)(v(h) - \dot{h})\,dt$, where SCS noise filtering is not shown explicitly.

All components of the robust controller are now given and proper operation can be
achieved, provided that the transformations are sign-proper. In turn, the first and possibly
the most problematic step of the learning task is the partitioning of the state space into
domains within which the sign of the control components does not change. Whereas finding
the sign-proper domains may be a hard task, it is worth noting that no other
condition is imposed on the BU and TD matrices. For example, these transformations can
be modulated vigorously, provided that sign-properness is kept (Appendix 5.1).
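
The interplay of the feedforward and the integrating feedback terms can be illustrated on a toy problem. The sketch below is a simplification under stated assumptions, not the controller of the model: the plant is a scalar first-order system, the approximate inverse dynamics is a single gain with the correct sign but the wrong magnitude, and the names `simulate_tracking`, `plant_gain`, `model_gain` and `feedback_gain` are hypothetical.

```python
def simulate_tracking(feedback_gain, steps=500, dt=0.01,
                      plant_gain=3.0, model_gain=1.0, target=1.0):
    """Return the final state and the worst relative speed mismatch after the transient."""
    q = 0.0                      # state of the toy plant
    w = 0.0                      # integrated feedback (compensatory) signal
    b_hat = 1.0 / model_gain     # approximate inverse dynamics; sign-proper if gains share sign
    worst = 0.0
    for k in range(steps):
        v = target - q                   # desired speed: a speed field pointing to the target
        u = b_hat * v + w                # feedforward term plus feedback term
        q_dot = plant_gain * u           # experienced speed of the plant
        w += dt * feedback_gain * b_hat * (v - q_dot)   # integrate the speed mismatch
        q += dt * q_dot
        if k >= 100:                     # skip the initial transient
            worst = max(worst, abs(v - q_dot) / abs(v))
    return q, worst


# Feedforward control alone inherits the full error of the inverse dynamics model.
_, mismatch_ff_only = simulate_tracking(feedback_gain=0.0)
# Adding the integrated feedback term pulls the experienced speed toward the desired one.
_, mismatch_robust = simulate_tracking(feedback_gain=20.0)
```

With the feedback gain set to zero, the experienced speed is three times the desired speed throughout. With the feedback term switched on, the relative mismatch drops to a few percent even though the inverse dynamics gain is still wrong by a factor of three.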

2.6. Working of the perfectly tuned hierarchy. Assume that all reconstruction net- 
works are perfect: At all levels, the products QNW of Fig. El are equal to I, the identity 
transformation. Assume that prediction is perfect, too. In this case, BU processing is

Figure 7. Robust control of the extended architecture 

A: The two architectures, i.e., the reconstruction network and the robust 
control architecture are merged. Black color: components of the robust 
control architecture. Top and bottom: reconstruction networks. Matrix 
U carries $\dot{x}$, the experienced speed. Also, $C\dot{h} = U\dot{x}$. Vector Vh may
modulate the effect of matrix U and it can introduce state dependence.
Matrices Ai and Bi are also components of the differencing controllers.
$u_{ff} = \hat{B}(h)(v(h) - \dot{h})$ and $u_{fb} = \int \hat{A}(h)(v(h) - \dot{h})\,dt$, where
matrices with hats comprise the effects of BU and TD transformations. (See text
and Appendix 5.1.)

B: The top of the hierarchy. Dashed gray arrows and dashed gray levels do
not belong to the architecture at the top. The top plays a double role: (1) It
is a reconstruction network. (2) It is a robust controller, because mismatch
influences the activities of a hidden layer at a lower level.
C: The top of the hierarchy together with a lower robust controller. Roman
letters represent corresponding sublayers of areas of the neocortical
hierarchy. Dark gray areas: robust control, light gray boxes: neocortical
layers, HC: hippocampus.



error free, no error appears, no error correction occurs and, in turn, BU processing is as 
fast as in feedforward networks.

Figure 8. Bottom-up filtering (A) and top-down control (B) are 
both feedforward for perfectly tuned networks 

Black arrows: the flow of information. White arrows: approximately silent 
connections. 

Similarly, for properly tuned inverse dynamics, top-down control will not produce
error and top-down processing also works as a feedforward control architecture. These
features are depicted for BU and TD processing in Fig. 8(A) and (B), respectively.
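
The effect of the condition QNW = I on processing speed can be checked in a linear toy loop. The sketch assumes a simplified, purely linear update of the hidden activity (h is incremented by NWe, with no sparsification and no prediction), which is only a stand-in for the non-linear equations of the model; the function names and the matrices are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relaxation_steps(Q, N, W, x, tol=1e-9, max_steps=1000):
    """Count loop iterations until the reconstruction error falls below tol."""
    h = [0.0] * len(N)
    for step in range(max_steps + 1):
        x_hat = matvec(Q, h)                               # top-down reconstruction
        e = [xi - xhi for xi, xhi in zip(x, x_hat)]        # reconstruction error
        if max(abs(c) for c in e) < tol:
            return step
        correction = matvec(N, matvec(W, e))               # bottom-up processed error
        h = [hi + di for hi, di in zip(h, correction)]
    return max_steps


W = [[2.0, 0.0], [0.0, 4.0]]
N = [[1.0, 0.0], [0.0, 1.0]]
Q_tuned = [[0.5, 0.0], [0.0, 0.25]]        # Q N W equals the identity
Q_mistuned = [[0.25, 0.0], [0.0, 0.125]]   # Q N W equals half the identity
x = [1.0, -2.0]

steps_tuned = relaxation_steps(Q_tuned, N, W, x)
steps_mistuned = relaxation_steps(Q_mistuned, N, W, x)
```

In the tuned case a single bottom-up pass sets the hidden activity so that the reconstruction is exact, which is the sense in which the loop behaves like a feedforward network. In the mistuned case the error only halves on each pass and many iterations are needed.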

2.7. Closing the loops of the hierarchy. We need to design the top of the sensory
processing hierarchy. The top level receives information from different sensory systems,
or modalities. Pattern completion may reveal that some of the components are missing
or have been corrupted by noise and need to be fixed. To have a coherent interpretation,
the top may need to correct such errors by influencing, i.e., controlling the internal
representations of lower layers, the 'environment' of the top. In turn, we need a twist at
the top: mismatch between input and reconstructed input at the top should also be the
mismatch between desired and experienced components. The two roles, reconstruction
and robust control, should be merged at the top. The solution is that the mismatch of the
top reconstruction network is also the mismatch of the inverse dynamics affecting the
hidden activity of a lower reconstruction network. This twist is shown in Fig. 7(B). Clearly,
this is a reconstruction network with hidden layer h displaced. Figure 7(C) depicts the
twisted top together with a lower control architecture. The hidden representation of the
top reconstruction network has a double role from the point of view of its neighboring
reconstruction error layers.

This trick, that is, the double role architecture at the top of the hierarchy, turns
bottom-up processing into top-down control. However, there are additional constraints
posed by information maximization: as long as lower networks are not properly tuned,
processing is not feedforward and error correction gives rise to temporally convolved signals
(Appendix 5.3). Temporal convolution corrupts the maximization of information transfer.
In turn, temporal blind source deconvolution (BSD) [7, 137, 136, 66] is necessary.

Figure 9. The top of the hierarchy and its mapping.

Maximization of BU information transfer is made in two steps: (i) whitening
and (ii) ICA (separation). The blind source deconvolution sub-network has
recurrent networks with long delays and removes temporal convolutions.
Corresponding areas/layers in the loop formed by the hippocampus and
the entorhinal cortex (EC): (i) reconstruction error (vector $e = x - \hat{x}$):
layer II of EC, (ii) reconstructed input (vector $\hat{x}$): layer III of EC, (iii) hidden
or model layer (vector h): layers V and VI of EC, (iv) recurrent loop of
hidden layer (matrix M): associative structure of EC layers V and VI, (v)
BU processing layer in computational order (whitened reconstruction error
w and separated reconstruction error s): area CA3 and area CA1 of the
hippocampus, respectively, (vi) BU matrices W1 and W2: EC afferents of
the CA3 layer and Schaffer collaterals, respectively, (vii) recurrent loops
with $\tau$ delays and vector d: internal circuitry of the dentate gyrus and
mossy cells of the dentate gyrus, respectively, (viii) matrix W3: perforant
path afferents of the dentate gyrus, (ix) matrix W4: EC afferents of area
CA1 of the hippocampus, (x) matrix Q: deep layer to superficial layer
connections of the EC, the long term memory components of the loop, (xi)
matrix N: CA1 afferents of EC deep layers, the model layer of the loop.

As has been emphasized by Lorincz and Buzsaki, BSD is, in general, very
demanding. The number of neurons and the number of connections required by BSD
are enormous. Fortunately, temporal convolution induced by reconstruction networks has
special properties and, in principle, temporal deconvolution can be executed by a relatively
small, but still 'expensive', structure (see Appendix 5.3 for details). This structure needs
recurrent connections with long and tunable delays. Given the robust control properties of
the hierarchy, approximately correct signals can be enforced by starting from the top [75].
Thus, this expensive BSD structure is necessary only at the top of the hierarchy. BSD
at the top deconvolves the reconstruction error before it is turned into a control signal.
The reconstruction network together with the BSD structure made of tunable delay lines is
depicted in Fig. 9.

There is another difference between Figs. and |H1: BU processing is executed in
Fig. 9 in two layers. The two layers perform two-step learning to maximize information
transfer. The transformation to the first layer whitens (i.e., it decorrelates and normalizes)
the components of the reconstruction error [64], whereas the transformation to the second
layer removes higher order correlations and, in turn, develops independent components.
The two-step learning algorithm is fast, because it proceeds along the so-called natural
gradient (see, e.g., IT| and references therein). ICA can be slower in the BU sparsification
transformation if it follows the one-step learning rule of Bell and Sejnowski [7].

(14)    $\Delta P \propto f(s)\,\hat{x}^{T} + (P^{-1})^{T}$

where f(·) denotes component-wise non-linearity. This learning rule has two terms. One
of them is a Hebbian term of the inputs and the outputs of matrix P. The other term is
proportional to matrix

(15)    $P^{-1} = NQ$

that is, to the rest of the loop. Matrix inversion, however, is not necessary: the second
term can be approximated, e.g., by noise generated at the BU error layer and targeting the
reconstructed input layer. The reconstruction architecture warrants this property through
Eq. 15 [76, 80]. In turn, the order of learning is as follows: Novel structure not encoded into
the TD matrix but embedded in mismatch vector e is blocked by SCS thresholding but
undergoes fast ICA analysis and develops high amplitude BU error components, which
cannot be fully eliminated by thresholding. The excess BU error undergoes temporal
integration at the level of the internal representation. High activity components of vector
h and high activity components of mismatch vector e will induce Hebbian learning in
matrix Q. This learning process is slow and it is followed adiabatically (i.e., very closely)
by matrix P. Given that learning of matrix P is subject to Hebbian learning between
outputs of matrix W and the reconstruction error produced by matrix Q, learning of matrix
P is, in a sense, 'supervised' by these matrices: once the TD matrix has incorporated the
novel information, matrices W and P become equal, provided that no further novel
information has entered the loop. That is, we have the following scenario: (a) novelty is
blocked, (b) matrix W is modified, (c) novelty is represented by a few large ICA components
(i.e., a few components of the BU error increase, whereas many components of the BU error
decrease), (d) large components overcome sparsification, (e) TD long-term memory changes
slowly, (f) this slow change is closely followed by BU sparsification.

We note that a possible role of matrix N can be the whitening of the output of
the BU error layer that underwent non-linear sparsification. This process, which advances
Hebbian learning for matrix M, will be discussed elsewhere [131].
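
Whitening, which appears in this section both as the first BU learning step and as a possible role of matrix N, can be illustrated in closed form for two-dimensional data. The sketch below is a batch computation under stated assumptions: it uses the symmetric inverse square root of the sample covariance rather than any Hebbian or natural gradient rule, and the function names and the synthetic data are hypothetical.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance entries (c_xx, c_xy, c_yy) of 2-D points, mean removed."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return cxx, cxy, cyy


def whitening_matrix_2d(cxx, cxy, cyy):
    """Symmetric whitening matrix C^(-1/2) built from the eigen-decomposition of C."""
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)   # rotation that diagonalizes C
    c, s = math.cos(theta), math.sin(theta)
    lam1 = cxx * c * c + 2.0 * cxy * c * s + cyy * s * s   # variance along (c, s)
    lam2 = cxx * s * s - 2.0 * cxy * c * s + cyy * c * c   # variance along (-s, c)
    a, b = 1.0 / math.sqrt(lam1), 1.0 / math.sqrt(lam2)
    return [[a * c * c + b * s * s, (a - b) * c * s],
            [(a - b) * c * s, a * s * s + b * c * c]]


# Synthetic correlated data: a fixed linear mixture of two uniform sources.
random.seed(0)
sources = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
mixed = [(u + 0.5 * v, 0.5 * u + 2.0 * v) for u, v in sources]

wm = whitening_matrix_2d(*covariance_2d(mixed))
whitened = [(wm[0][0] * x + wm[0][1] * y, wm[1][0] * x + wm[1][1] * y) for x, y in mixed]
wxx, wxy, wyy = covariance_2d(whitened)
```

After the transformation the components have unit variance and zero correlation. Higher order dependencies are untouched, which is why a separate ICA step is still needed.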

3. Results: Straightforward mapping to sensory processing areas 

Mapping, based on the reconstruction network description, has been thoroughly
described elsewhere [TnHSHHO]. The control view complements the basic structure of
that mapping: beyond the function of connections between neocortical layers, it explains
the function of connections of the neocortical layers that have not been modelled previously.

Figure 10. Neocortical circuitry and the dentate gyrus 

A: Input: layer IV. Layer IV neurons send messages to layer II, layer III 
and layer VI. Layer V and layer VI neurons receive messages from layer 
II and layer III. Layer V neurons provide feedback to layers II and III. 
Neurons of layers V and VI are connected (indirectly shown by the prox- 
imity of these layers) . Neurons of layer II and III are also connected (only 
inhibitory connections are shown). Feedforward output to higher cortical 
layers: layers II and III. Feedback to lower layers: layer V. 
B: The excitatory connections between granule cells (g) and hilar mossy 
cells (mc) as well as in the g-CA3 pyramidal cells (p)-mc-g loop provide de- 
lay lines. Feedback inhibitory neurons innervate specific dendritic segments 
of the granule cells at the termination zone of the EC and mc afferents. 
HIPP: hilar interneuron with axonal termination in the perforant path 
zone; HICAP: hilar interneuron with axon termination in the commissural 
and association paths, mf: mossy fibers, $\tau$: synaptic delay. The activity of
these interneurons also controls plastic changes (i. e., training) of the mc-gc 
synapses. 

3.1. Mapping to neocortical regions. The neocortex is made of six sub-layers
(Fig. 10(A)). The figure depicts the most prominent connections between these sub-layers
[83]. Input typically arrives at layer IV. Layer IV neurons send messages to layer II and
layer III (not shown). Furthermore, layer IV neurons also send messages to layer VI.
Superficial neurons provide output down to layers V and VI. There are connections between
neurons of layer V and layer VI. Neurons of layers II and III are also strongly connected.
Layer V provides feedback to layers II and III. The main output to higher cortical layers
emerges from layers II and III. The main feedback to lower layers is provided by layer V.
(For a review see, e.g., [T^.l)

The theoretical model and the anatomical structure can be matched only by assuming
that reconstruction networks are laid between neocortical layers, as denoted
by the dark gray areas in Fig. 7(A). On the other hand, robust control is executed by the
cortical layers, the light gray boxes of Fig. 7(A). According to this figure, superficial layers
of the lower cortical layer and deep layers of the higher cortical layer form one functional
unit, the reconstruction network. The reconstruction error and the reconstructed input of
this functional unit exert control action on lower reconstruction networks. This controller
receives as input the experienced state and the BU processed reconstruction error (i.e.,
the experienced momentum) of the lower reconstruction network.

In this view, V1 performs two functions: layers V and VI of the primary visual
cortex hold the internal representation of the LGN, while layers II and III represent the
input and the reconstructed input of V2, respectively.

3.2. Mapping to the hippocampal-entorhinal loop. The hippocampus (HC) is
placed on the top of the hierarchy (Fig. 7(C)). The HC incorporates a unique subunit,
the dentate gyrus, which, in our view, removes temporal convolution of lower reconstruction
networks. Loops required for temporal deconvolution do exist in the dentate gyrus
(Fig. 10(B)). Notably, extremely long delays (on the order of 500 ms), which are a crucial
prediction of our model [75], have been found experimentally [44].

Besides temporal deconvolution, HC is also engaged in information maximization
and performs whitening and separation (Fig. 9) in the CA3 and CA1 subfields
of the hippocampus^1. According to our model, the hippocampus plays two roles:

(1) HC acts upon the deep layers of the EC. In turn it can exert control action on the 
model layer of the EC. 

(2) HC and EC together form a reconstruction network, the EC-HC loop. This loop
is special in that it may perform blind source deconvolution on its inputs.

The two-phase operation of the loop ensures the correct order of learning: analyzing and
maximizing BU information transfer and the encoding of top-down memory. Details about
the mapping to the EC-HC loop as well as details about two-phase encoding can be found
elsewhere [76]. According to that model, encoding of long term memory is initiated by the
recurrent collaterals (not shown in the figure) of the CA3 area in one of the two phases,
the sharp-wave phase.

4. Discussion 

4.1. Relation to other models. From the computational point of view, we should mention
the model of Gluck and Myers [29], which was designed to perform reconstruction
and classification together for modeling some properties of the hippocampus. This
reconstruction idea also gains importance in a recent model of Stainvas et al. [122], in which
reconstruction is used as a regularization constraint in a classification task for creating
better representations.

Another approach has been proposed by Rao and Ballard. They have put forth
an integrating model [104, 103, 105] by exploring a Kalman-filter analogy to cope with
the input and system uncertainties (treated as noise) and presented a hierarchy for error
correction and prediction using top-down inference from higher levels. For spatial
learning tasks, another Kalman-filter based model has been proposed with a biological
mapping to the hippocampus [9]. A Kalman-filter is a kind of reconstruction or generative
network, which uses an internal representation to generate expected inputs. Although the
Kalman-filter idea could be an efficient and plausible function for sensory processing, the
mapping of the proposed function onto the anatomical and neurophysiological findings has
not been elaborated. Another problem is that noise filtering in Kalman-filters requires
matrix inversions. Also, Kalman-filters form loops, which are generally slow for the
processing speeds found in neocortical areas. Apparently, sensory processing in neocortical
areas is feedforward p^ .

^1 No layer equivalent to layer IV of the neocortex is present in this loop.
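
The Kalman-filter loop referred to above can be made concrete with the textbook scalar filter. This is a generic sketch, not the model of Rao and Ballard nor the CHF model: the parameter names (`a`, `c`, `q`, `r`) and the deterministic alternating disturbance that stands in for observation noise are assumptions chosen for illustration. In the scalar case the matrix inversion of the gain computation reduces to a division.

```python
def kalman_step(x_est, p_est, y, a=1.0, c=1.0, q=1e-4, r=0.25):
    """One predict/update cycle of a scalar Kalman filter."""
    # Prediction from the internal model (the generative step).
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Mismatch between the observed input and the expected input.
    innovation = y - c * x_pred
    # Gain computation; in the matrix case this division is a matrix inversion.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


# Track a constant hidden value of 2.0 observed with an alternating disturbance.
x_est, p_est = 0.0, 1.0
for k in range(200):
    y = 2.0 + (0.5 if k % 2 == 0 else -0.5)
    x_est, p_est = kalman_step(x_est, p_est, y)
```

Each cycle generates an expected input from the internal estimate, compares it with the observation and corrects the estimate, so the computation is a loop rather than a single feedforward pass.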

4.2. Algorithmic components of the CHF model. The CHF has two basic components:
(1) the reconstruction network and (2) a robust control architecture.

(1) According to Horn [48], vision is inverse graphics. Our model generalizes this view
by the long-standing proposal that the hippocampus and its environment serve
as a 'comparator' [HH I12UI I139j. Auto-associators with hidden layers 13^1, i.e.,
reconstruction networks, come into view by considering these two suggestions
together.

(2) The control aspect of sensory processing was suggested long ago: according
to Diamond |^, layer V could be seen as an extension of the motor cortex
in all areas, because layer V neurons send inputs to the basal ganglia, the brain stem and
sometimes even to the spinal cord. This conjecture is reinforced by our mapping,
where layer V exerts control actions on lower neocortical areas.

The CHF model makes use of maximization of information transfer in sensory processing,
first suggested by Attneave ^ and Barlow [6]. We note that ICA produces an optimal
representation for mean-field approximation that empowers local (i.e., fast) approximation
for inferencing (information extraction from uncertain observations).

One of the main algorithmic components of the CHF model, the so-called sparse
code shrinkage (SCS), performs database optimized denoising EDI-. The intriguing
point is that denoising is experience based and, in turn, novelty - at first sight - may
appear as noise. The high 'noise content' blocked by sparsification is a direct sign of the
possibility of novel information. Novelty detection that precedes recognition (the searching
of the database) has been an old mystery of information processing in the brain. The SCS
algorithm placed into the reconstruction network offers a solution here (see later).

Speed-field tracking (SFT) based controlling is the other main component of our
model. Efforts have been made to derive the entorhinal-hippocampal loop [75] starting
from SFT. It can be motivated by

(1) the opinion that the medial paralimbic system (that includes the supplementary
motor area as well as the anterior cingulate cortex, and that develops the elaborated
basal ganglia thalamocortical loops) originates from the hippocampal cortex
and the strength of speed-field tracking in modelling those basal-ganglia
thalamocortical loops [T^l EZl,

(2) the control aspect of layer V of neocortical areas,

(3) the view that the brain formulates signals that specify positions and directions
of targets in extrapersonal space (see, e.g., [146]) that resembles a speed-field with
dynamic path planning capacity ^Hl ll3Uj,

(4) the mathematical tractability of the speed-field tracking based control architecture
[126, 129, 128].

4.3. Arguments for the CHF model. 

4.3.1. Experimental evidence. As noted in the introduction, two predictions
of the model, (a) large and tunable temporal delaying capabilities of neurons of the dentate
gyrus and (b) persistent activities at deep layers of the EC, have been reinforced recently;
for (a), see [44]. Prediction (b) follows from the temporal integration
at the model layer, which is located in the deep layers of the EC. In turn, the CHF model has
already passed two falsifying predictions.

4.3.2. Implicit memory effects, order of learning. 

Order of learning: The learning steps warranted by the model are as follows:

(1) The possibility of novelty is signaled by a non-sparse BU processed error activity
pattern.

(2) BU processed non-sparse activities, such as noise or novel information, are gated
by sparsification.

(3) 'Noise' constantly undergoes fast maximization of information transfer in the BU
processed error channel via ICA. Denoising (sparsification) cannot withhold high
amplitude components of the BU processed error, and information (i.e., structure
embedded in noise) becomes available to the hidden layer of the model.

(4) Structured information is encoded into the top-down matrix, the long-term mem- 
ory of our model. 

In turn, the architecture learns structure and rejects noise. 

Some implicit memory effects emerge directly from the CHF model. 

Repetition priming: The term 'priming' in a broader sense refers to the observation that
an earlier encounter with a given stimulus can modify ('prime') the response to the
same or a related stimulus (see, e.g., [107] and references therein).

It has been shown by numerical simulations of reconstruction networks using ICA
for BU information maximization that repeated presentations of not yet learned (novel or
partially novel) inputs shorten the relaxation time of the reconstruction architecture without
modifying LTM. The experienced decrease of relaxation time has been interpreted as
priming jS^]. It can also be demonstrated that this effect is enhanced by SCS denoising.

Repetition suppression and repetition enhancement: The neuronal correlate of priming is
thought to be repetition suppression [21]. This belief is supported by the joint appearance
of the two phenomena in many experiments, see, e.g., [10, 87]: both cognitive and
neurophysiological experiments show that neurons in the neocortex respond with less and
less activity in the case of repeated stimuli (see, e.g., [113] and references therein). This
repetition decrement is often called 'repetition suppression' in the primate literature [142].
Numerical experiments demonstrate that repetition suppression appears jointly with the
shortening of relaxation time [S5] in extended reconstruction networks. An intriguing
phenomenon is that during repetition suppression, a few neurons do exhibit repetition
enhancement [21], an emergent property of our model: repetition enhancement is shown
by those few units which, upon information maximization, become strongly activated and
can break through the threshold of sparsification [H2j.

Distributed nature of implicit memory effects: It is known that implicit memory effects,
such as the recognition of novelty, are distributed [140]. The order of learning, as described
at the beginning of this subsection, warrants that these effects, including novelty detection,
may occur at every reconstruction network and, in turn, they are distributed in the CHF
model. Similarly, the comparator function is distributed in the model, too.

Prototype learning: Recent research has provided evidence that category learning is mediated
by multiple neuronal systems in the brain (see, e.g., and references therein, but see
[92]). In contrast, when information is accumulated from many exemplars and no verbal
rules are easily available, implicit mechanisms related to the basal ganglia may operate.
A good example of this latter type is a classic prototype learning paradigm [101, 62, 61].
Interestingly, prototype learning, which is spared in patients with HC damage [62], is
impaired in Alzheimer patients [59, 58]. In reconstruction networks, many-exemplar-based
prototype learning may emerge in different ways. We have also demonstrated that the
adaptation of the recurrent excitatory connections of the hidden layer |^ EOj can explain
the impairment found in Alzheimer patients. Another candidate structure is the recurrent
excitatory connections of superficial layers. These recurrent connections have not yet been
included in the CHF model. Their function will be conjectured later.

4.3.3. Temporal compression. The recurrent excitatory connections of the deep layers,
which perform spatio-temporal pattern completion, and the error-correcting associative
connections of the superficial layers together fulfill the requirements of temporal compression
if network operation is not continuous but periodic [138]. Such temporal compression
has been observed in the hippocampus [94, 119].

4.3.4. Specific properties vs. generalization. McClelland et al. [86] emphasize the necessity
of a dual system for the seemingly contradictory tasks of learning specific properties
and allowing for generalization. The CHF model allows for an attractive solution here:
specific properties can be encoded into the LTM, whereas generalization is allowed by the
flexible combination of LTM components at different levels of the hierarchy by the control
means. TD control, up to the limits imposed by sign-properness, can distort, combine,
and excite memory components.

4.3.5. Mathematical issues. 

Adaptation and learning rules: As has been described elsewhere [76, 80], the loop structure
is advantageous for Hebbian learning. Neural activities and noise, together with a
relatively long (50 ms) temporal window, are required for whitening and the learning of
independent components in both BU pathways. Top-down matrices are trained by the
reconstruction error and the hidden layer activity (the delta rule); the necessary signals
for Hebbian learning are available between the model layer and the reconstruction error
layer [TBI En].

The controller can learn by experimenting, i.e., by the learning-by-doing scheme
[26]. No matter what the desired parameters are, the experienced parameters need to be
associated with the control parameters to form an approximate inverse dynamics. When the
approximation becomes sign-proper, the control architecture will be approximately precise.


Adaptation of the robust controller is straightforward and occurs via error feedback
and temporal integration. It is an attractive property of the control architecture that fast
temporal changes of different transformations of different networks of the hierarchy (i.e.,
learning) cannot disturb the stability of the controller as long as the control remains
sign-proper (Appendix 5.1).

Noise in the controller: It has been noted in [126] that the SDS scheme is sensitive to
noise if the noise enters the system just before the compensatory vector is integrated, i.e.,
if noise affects $w = u_{fb}$ of Eq. El. Such noise can easily make the system unstable,
because the perturbation takes the form



where n(t) denotes the noise. Unfortunately, the boundedness of the integral of Eq.|Hl cannot
be ensured for the general case. Moreover, the amplitude of the perturbation will be
proportional to A. This means that increasing A will also increase the perturbation of the
system. This problem affects every dynamic state feedback controller, provided
that noise can enter precisely before the point where the compensatory control signal is
integrated through time. It is an emerging feature of the CHF architecture that this
problematic noise component can be diminished by optimized SCS denoising.
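
The sensitivity described above can be reproduced numerically. The sketch assumes that the perturbation is the feedback gain times the time integral of the noise entering before the integrator, and it uses a hard threshold as a crude stand-in for SCS denoising; the function name, the noise amplitude and the threshold value are hypothetical.

```python
import random


def integrated_perturbation(noise, gain, dt, threshold=0.0):
    """Largest magnitude reached by gain times the running integral of thresholded noise."""
    total, worst = 0.0, 0.0
    for n in noise:
        kept = n if abs(n) > threshold else 0.0   # hard gate: small components are dropped
        total += gain * kept * dt                 # integration through time
        worst = max(worst, abs(total))
    return worst


random.seed(1)
noise = [random.uniform(-0.1, 0.1) for _ in range(10000)]

raw_small_gain = integrated_perturbation(noise, gain=1.0, dt=0.01)
raw_large_gain = integrated_perturbation(noise, gain=10.0, dt=0.01)
filtered = integrated_perturbation(noise, gain=10.0, dt=0.01, threshold=0.15)
```

Without thresholding, the integrated noise drifts like a random walk and the drift scales with the gain. With a threshold above the noise amplitude, nothing reaches the integrator, so the perturbation vanishes in this toy setting. In practice the threshold also removes small signal components, which is the price of the denoising.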

On the other hand, the presence of noise is necessary and advantageous in the CHF
model. It is necessary for Hebbian learning TB^ and it is also advantageous for improving
the generalizing capability of ICA [53].

Reinforcement learning and behavioral relevance: It is a central issue whether the robust
controller can be incorporated into the reinforcement learning (RL) framework or not. This
problem has been treated theoretically and answered positively: the integration of the robust
controller and RL is possible within the so-called event learning framework, a novel form
of reinforcement learning [78, 134, 135, 79]. In this scheme, time is broken into discrete
time intervals and, if temporal resolution is sufficient, then near optimal performance with
uncertain state descriptions can be achieved. Some parameters that might change considerably
(e.g., the mass [128] or the length of a robotic arm [2ZI) may be left unobserved
without affecting near optimality, a rare feature in RL models. Furthermore, the robust
controller fits smoothly the reinforcement learning interpretation of dopamine responses
found in the basal ganglia [115, 74]. Considering behavioral relevance, there is no
decision-making system embedded into the CHF model; the CHF model is passive. On the other
hand, we have succeeded in showing that the CHF model, including the reconstruction loop
architecture and the robust controller, can be embedded into the reinforcement learning
framework and near optimal performance will be warranted. In turn, the model can be
easily extended by decision-making and planning. 

In the CHF network, the hidden variables are controlled. It is then a separate issue
whether a model with hidden variables can be optimized using RL. We note that Kalman filters,
which are mathematically tractable, can be seen as
the mathematical approximation of the CHF scheme. Kalman filters have been integrated
into the reinforcement learning framework and convergence to the optimal solution is
guaranteed [132], [133]. To the best of our knowledge, this is the first case in which a partially observed
Markovian decision problem is shown to converge and to learn the optimal policy. One
emerging property of the derivation is that the learning rule for the value of states has a
Hebbian form [132]. The weight factor of the learning rule is proportional to the error of
value prediction. Based on this observation we may remark the following:

Note (1) The CHF model, like its previous versions [7S1 [TBI ISOl, is neither a
model for episodic learning nor a model for incremental learning, and it does not fit such
traditional distinctions (see, e.g., [28]). On the one hand, when information maximization
is not modulated by behavioral relevance, the CHF model is an incremental model. On 
the other hand, the CHF model can be instantaneous, if behavioral relevance (i.e., error 
in value estimation) increases learning efficiency. Error in value estimation can modulate 
the Hebbian learning rule of LTM and, in turn, input can be immediately encoded into 
the reconstruction architecture. 

Note (2) Long-term memories of the CHF model are not permanent; they may
change. Changes are subject to the statistical properties of the information (because information
transfer is to be maximized) and to the behavioral relevance of the information. That is,
in the CHF model, 'memory traces are unbound' [91].
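
The remark that the value-learning rule is Hebbian, with a weight factor proportional to the error of value prediction, can be illustrated with tabular TD(0). TD(0) is used here as a familiar stand-in and is not the Kalman-filter based derivation of [132]; the five-state random walk, the step size and the number of episodes are hypothetical choices.

```python
import numpy as np

def td0_random_walk(episodes=5000, alpha=0.02, n_states=5, seed=0):
    """Tabular TD(0) on a symmetric random walk; reward 1 only at the right exit."""
    rng = np.random.default_rng(seed)
    values = np.full(n_states, 0.5)
    for _ in range(episodes):
        s = n_states // 2                      # start in the middle state
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:                     # left exit: terminal, no reward
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:           # right exit: terminal, reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            td_error = reward + v_next - values[s]
            # Hebbian-like form: activity of the visited state (one-hot, i.e. 1)
            # times a factor proportional to the value-prediction error.
            values[s] += alpha * td_error
            if done:
                break
            s = s_next
    return values

learned = td0_random_walk()
exact = np.arange(1, 6) / 6.0   # known state values of this random walk
```

A larger prediction error produces a larger weight change for the same activity, which is the sense in which behavioral relevance can speed up encoding in Note (1).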

4.4. Conjectures. The CHF model allows us to make predictions. Some of the con- 
jectures qualify as attractive possibilities offered by the CHF model, whereas others are 
falsifying predictions. 

Principles of learning bottom-up and top-down transformations may apply for the 
learning of the predictive matrix (i.e., matrix N) of the hidden layer, too. It is easy to 
show that minimization of the square of the prediction error may lead to Hebbian learning 
for this matrix and has the following form: 

(16) AM oc ^Nh) h'^ 

provided that three conditions are met: (i) connections (synapses) have access to the
corresponding hidden layer activities, i.e., to h; (ii) the same connections have access
to the error of the hidden layer activities on the other side, i.e., to Nh; and (iii) input
to the hidden layer is whitened to avoid the necessity of multiplication by the inverse of
the correlation matrix C(h, h). From condition (iii) it is conjectured that transformation
N, which corrects the activity of the hidden layer, whitens the sparsified BU error. This
transformation has remained unconstrained: the prescribed identity transformation of
the full loop can be achieved by rescaling matrices, e.g., the top-down matrix. Whitening
is also advantageous from the point of view of learning: the learning rule of Eq. (16) follows
the natural gradient [2]. Condition (ii) implies that one of the deep layers holds this
whitened BU error, whereas the other holds the hidden activities. Local circuits between
the two layers can provide the appropriate Hebbian training signals. Given that neurons
of layer V play control roles, it is layer VI that can hold the whitened BU error. If this
is indeed the case, then sustained activity of neurons of layer V may be more pronounced
than that of neurons of layer VI.
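
A minimal sketch of why whitening matters for an outer-product rule of this type. It assumes, for illustration only, that the error term is the residual between a target activity and the prediction N h; the teacher matrix, the dimensions and the learning rate are hypothetical.

```python
import numpy as np

def learn_predictive_matrix(n_dim=4, steps=20000, rate=0.01, seed=0):
    """Delta-rule learning of a matrix N from whitened inputs h."""
    rng = np.random.default_rng(seed)
    n_true = rng.normal(size=(n_dim, n_dim)) / np.sqrt(n_dim)  # hypothetical teacher
    n_est = np.zeros((n_dim, n_dim))
    for _ in range(steps):
        h = rng.normal(size=n_dim)        # whitened input: E[h h^T] = I
        error = n_true @ h - n_est @ h    # prediction error on the "other side"
        # Gradient step on 0.5 * ||error||^2: error times presynaptic activity.
        n_est += rate * np.outer(error, h)
    return n_true, n_est

n_true, n_est = learn_predictive_matrix()
```

With whitened h the expected update is proportional to the remaining mismatch itself. For correlated h it would be the mismatch multiplied by C(h, h), and removing that factor would require the inverse correlation matrix mentioned in condition (iii).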

The role of associative structures in the superficial layers is not apparent in the
CHF model. It is conjectured here that these associative connections may serve error
correction. We need to tell error correction and pattern completion apart. Here, pattern
completion concerns spatio-temporal patterns. Error correction, on the other hand, is seen
as a synchronous operation over actual inputs, as in content-addressed memories [SHI
or Hopfield network-like constructs [143]. The dynamic Hopfield model [56] is a candidate
CNS model for layers with recurrent collaterals. Note that in the CHF model [THj the
recurrent collateral system of the CA3 subfield elicits randomly ordered temporal replays
of input sequences and serves to encode information into the top-down memory and to
diminish weak connections to improve generalization capabilities [53]. This replaying role
is supported by experimental evidences jl44l 11181 1^ HT] . 

Feedback connections between neocortical layers are generally more numerous than 
the feedforward connections between neocortical layers and these connections seem to have 
a weak functional role (see, e.g., J2 and references therein). Also, interaction within 
neocortical layers is much stronger than feedback activities. On the other hand, these 
feedback connections play a central role in the CHF model; these connections form the long 
term memory of the architecture. In our view, this is an apparent contradiction. The trick 
is that reconstruction is fast and easy and, in the CHF model, it is feedforward in a well 
tuned network. The hard problems are (i) how to choose amongst the different possibilities, 
i.e., the 'negotiation' between different neocortical columns, different neocortical areas and 
different sensory modalities. The possibility that different interpretations may coexist in 
the brain has been made evident in the animal experiments on binocular rivalry (see, e.g., 
[68] and references therein) and in experiments with several possible visual interpretations
[67]. (ii) How to express the decisions that have been made? It is expected that on-going
control signals - no matter if conscious or not - are separated from background noise by 
synchronous operation. Such synchronous signals dominate the reconstruction process. 

Falsifying prediction 1. An intriguing conjecture of the CHF model is that dis- 
turbance of the feedback connections between neocortical layers may corrupt apparent 
feedforward processing and recognition but should not corrupt prototype learning. 

Falsifying prediction 2. The CHF model allows us to claim that perceptual learning
(see, e.g., [114] and references therein) and categorical perception (see, e.g., HO]
and references therein) are two manifestations of the same long-term memory effects. If 
so then (i) the place of encoding should include almost exclusively the top-down LTM 
components of areas engaged in sensory processing and (ii) frequent appearance without 
strong behavioral relevance will barely modify the same components. Nevertheless, mod- 
ified inputs to higher layers may influence bottom-up processing in areas above the place 
of encoding. 

5. Appendices

5.1. Appendix A: Robustness and stable on-line adaptation. A plant is called 
first order if its 'position' (or configuration) determines the state of the plant and if the 
dynamical equation determines the momentum (of all parts of the plant). A plant is 
called second order if the state is given by the configuration and the momentum and if 
the dynamical equation determines the acceleration, and so on. Higher order plants can 
be rewritten into the form of a set of first order differential equations by concatenating 
'position', 'momentum', etc. into the state vector (see also Appendix 5.2).

Speed field tracking (SFT) is not typical in the control literature, but arises naturally
if we consider stationary optimal-control problems such as path planning tasks [49].
Conventional control tasks, such as point-to-point control and trajectory tracking, cannot
be exactly rewritten in the form of SFT and vice versa [126], [127]. SFT prescribes the
speed vector $\dot{x}$ of the plant as a function of the state vector:

(17) $\dot{x} = v(x)$

The SFT task has the advantage that the designer can incorporate several objectives into the
form of the speed field to be tracked and hence extend the model's range of possibilities.

The mathematical treatment described here is a slight generalization of that
published in [128], [126]. As in the original proof, the control scheme works for plants of
any order. The requirement that the feedforward and the feedback controllers be identical is relaxed.
This slight generalization seems necessary to properly describe the superficial-to-deep
layer wiring of the neocortex.

Let $D \subset \mathbb{R}^n$ denote the domain of the plant's state with the equation of motion
given by

(18) $u = B(x)\dot{x} + b(x)$

where $x$ is the state vector of the plant and $u \in \mathbb{R}^m$ is the control. For simplicity the
dependence of $B$ and $b$ on $x$ will not be explicitly represented. Now let us assume that
we have two estimates of the true inverse-dynamics function $\Phi(x, \dot{x}) = B\dot{x} + b$, given by
$\hat{\Phi}(x, \dot{x})$ and $\hat{\Psi}(x, \dot{x})$:

(19) $\hat{\Phi}(x, \dot{x}) = \hat{B}\dot{x} + \hat{b}$

(20) $\hat{\Psi}(x, \dot{x}) = \hat{A}\dot{x} + \hat{a}$

The SDS feedback control equations can then be written as

(21) $u = u_{ff}(x, \dot{x}, v(x)) + w$,

(22) $\dot{w} = \Lambda \left( \hat{\Phi}(x, v(x)) - \hat{\Phi}(x, \dot{x}) \right)$

where $u_{ff}$ is the so-called feedforward controller (to be specified later) and $\Lambda > 0$ is the gain
of feedback.

In what follows the usual definition of positivity for fields of square matrices will 
be required. 

Definition. Let $M : D \to \mathbb{R}^{p \times p}$, $p > 0$. $M$ is said to be positive definite uniformly
over $D$ iff for all $x \in D$ the term $M(x)$ is positive definite and there exists an $\epsilon > 0$ such
that $\lambda_{\min}(M(x)) > \epsilon$ holds for all $x \in D$. Uniform negative definiteness can be similarly
defined.

If $M$ is a real square matrix then let $M > 0$ denote that $M$ is positive definite.
Similarly, if $M$ is a matrix field over $D$, let $M > 0$ denote that $M$ is uniformly positive
definite over $D$.
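
The definition can be probed numerically by sampling. For a non-symmetric real matrix, positivity of the quadratic form is equivalent to the symmetric part having positive eigenvalues, so the sketch below takes the smallest eigenvalue of the symmetric part; this reading of the minimal eigenvalue, the helper name and the example field are assumptions. Sampling can refute uniform positive definiteness over D but cannot prove it.

```python
import numpy as np

def min_eig_over_samples(matrix_field, samples):
    """Smallest eigenvalue of the symmetric part of M(x) over the sampled states."""
    smallest = np.inf
    for x in samples:
        m = matrix_field(x)
        sym = 0.5 * (m + m.T)
        # eigvalsh returns eigenvalues in ascending order.
        smallest = min(smallest, np.linalg.eigvalsh(sym)[0])
    return smallest

# Hypothetical matrix field on D = [-1, 1]^2; its antisymmetric part
# does not affect the quadratic form.
def example_field(x):
    return np.array([[2.0 + x[0] ** 2, 0.5 * x[1]],
                     [-0.5 * x[1], 1.0 + x[1] ** 2]])

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(500, 2))
eps_estimate = min_eig_over_samples(example_field, states)
```

For this example field the symmetric part is diagonal with entries of at least 1, so any positive bound below 1 serves as the uniform constant.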

Theorem. Assume that the feedforward controller has the form

$u_{ff}(x, \dot{x}, v) = \hat{\Psi}(x, v) - \hat{\Psi}(x, \dot{x})$,

which is similar to the input of the feedback integrator. Further, assume that the following
hold:

(1) $\hat{\Phi}(x, \dot{x}) = \hat{B}\dot{x} + \hat{b}$

(2) $\hat{\Psi}(x, \dot{x}) = \hat{A}\dot{x} + \hat{a}$

(3) $X^{T}Y$, where $X$ and $Y \in \{A, \hat{A}, \hat{B}\}$, are uniformly positive definite over $D$

(4) $A$, $v$, $b$ are bounded and have uniformly bounded derivatives w.r.t. $x$ over $D$

Then for all $\Lambda > 0$ the error of tracking $v(x)$, $e = v(x) - \dot{x}$, is eventually uniformly
bounded and, further, the eventual bound $\delta$ of the tracking error can be made arbitrarily
small. More specifically $\delta = O(1/\Lambda)$, and the eventual bound for the time of reaching $\|e\| < \delta$
is proportional to $\Lambda$.

The proof of this theorem relies on a Liapunov-function approach. First of all, note
that $\dot{w} = \Lambda\hat{B}(v - \dot{x})$ and that $u = \hat{A}(v - \dot{x}) + w$. With $A = B$ by definition, the relation
$(A + \hat{A})e = Av + b - w$ can be employed to show that
$L = \frac{1}{2} e^{T} (A + \hat{A})^{T} (A + \hat{A})\, e$ is an appropriate semi-Liapunov
function. The proof starts by differentiating the right-hand side of this relation with respect to time:

(23) $\frac{d}{dt}\left(Av + b - w\right) = \frac{\partial (Av + b)}{\partial x}\dot{x} - \dot{w}$

(24) $\quad = \frac{\partial (Av + b)}{\partial x}\dot{x} - \Lambda\hat{B}(v - \dot{x})$

(25) $\quad = f - \Lambda\hat{B}e$

where $\frac{\partial (Av + b)}{\partial x}$ denotes the Jacobi derivative matrix of the vector $Av + b$ with respect to the vector $x$ and
$f = \frac{\partial (Av + b)}{\partial x}\dot{x}$. In turn, the temporal derivative of our semi-Liapunov function assumes
the following form:

(26) $\frac{d}{dt} L = e^{T}(A + \hat{A})^{T}\left[ f - \Lambda\hat{B}e \right]$

Then the proof can be completed by using the methods thoroughly detailed in [126]. The
proof compares the two terms of the r.h.s. of Eq. (26) and notes that the negative term,
which can be made arbitrarily large, ensures uniform boundedness.

The subtle point of this theorem is the requirement that $\hat{A}^{T}A$ and $\hat{A}^{T}\hat{B}$ should
be uniformly positive definite over $D$. The theorem gives rise to a global stability result.
Notice too that the particular form of the feedforward and feedback controllers makes it
unnecessary to build an estimate of $b$.
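
The statement of the theorem can be explored with a small simulation. The sketch below assumes a two-dimensional plant with constant $B$ and $b$, crude identity matrices as the two estimates, and a speed field with a limit cycle; all of these are hypothetical choices. The controller uses the experienced speed that its own output produces, and this algebraic loop is resolved in closed form here purely as a simulation convenience. Explicit Euler integration is used.

```python
import numpy as np

def simulate_sds(gain, steps=20000, dt=1e-3):
    """Mean tracking error |v(x) - xdot| over the second half of the run."""
    B = np.array([[2.0, 0.3], [0.1, 1.5]])   # true plant matrix (hypothetical)
    b = np.array([1.0, -0.5])                # true plant offset (hypothetical)
    A_hat = np.eye(2)                        # crude feedforward estimate
    B_hat = np.eye(2)                        # crude feedback estimate
    J = np.array([[0.0, -1.0], [1.0, 0.0]])

    def v(x):
        # Speed field to be tracked: rotation plus attraction to the unit circle.
        return J @ x + (1.0 - x @ x) * x

    x = np.array([1.0, 0.0])
    w = np.zeros(2)                          # integrated compensatory signal
    solve = np.linalg.inv(B + A_hat)
    errors = np.empty(steps)
    for k in range(steps):
        vx = v(x)
        # Plant: u = B xdot + b. Controller: u = A_hat (v - xdot) + w.
        xdot = solve @ (A_hat @ vx + w - b)
        e = vx - xdot
        w = w + dt * gain * (B_hat @ e)      # feedback integrator
        x = x + dt * xdot
        errors[k] = np.linalg.norm(e)
    return errors[steps // 2:].mean()

error_low_gain = simulate_sds(10.0)
error_high_gain = simulate_sds(100.0)
```

Neither estimate matches the plant and no estimate of the offset is used, yet in this sketch the tracking error shrinks as the gain grows, in line with the bound of order one over the gain.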

Note that in the case of Eq. (18) there is no dependence on the approximated inverse
dynamics. This fact can be exploited to show that the above proof remains valid if $\hat{A}$,
$\hat{B}$, $\hat{a}$ and $\hat{b}$ vary in time but the conditions of the theorem remain valid at every instant.
Thus we get the following important corollary:

Corollary. Suppose that the conditions of the Theorem hold and also that $\hat{A} = \hat{A}(t)$,
$\hat{B} = \hat{B}(t)$, $\hat{a} = \hat{a}(t)$ and $\hat{b} = \hat{b}(t)$. Next assume that $B^{T}\hat{B}$ and $X^{T}(t)Y(t)$, where $X(t)$
and $Y(t) \in \{A(t), \hat{A}(t), \hat{B}(t)\}$, are uniformly positive definite over $D$ for all $t > 0$,
and that $\hat{A}(t)$ and $\hat{B}(t)$ are bounded. Then the conclusions of the above theorem still hold.

The uniform positive-definiteness conditions of the corollary follow, e.g., when $B$
is bounded away from singularities uniformly over $D$: an assumption often required in
adaptive control [112]. It is clear, too, that the stability result does not depend on the
specific adaptation mechanism utilized, which is a fairly rare property in adaptive control
theory. It also follows that the gain $\Lambda$ can be adapted during control.

However, one has to provide an additional proof to show that the conditions re- 
quired for A and B are obeyed. If those conditions are not obeyed, then exponential 
deviation may occur. In the case of exponential deviation, control should be stopped before
a crash and the 'learning-by-doing' procedure (Section Q can be invoked for the
history recently experienced. 

Change of notations. One may make use of the following condensed notations: $x_{des} = x_{des}(x)$,
$x_{exp} = x_{exp}(x)$, where 'des' and 'exp' refer to desired and experienced quantities,
which are dependent on the state $x$. This notation can be augmented by the following
'subtraction rule':



(27) 



v(x) — X 



X 



The notation and the subtraction rule allow one to depict the robust control scheme of
higher order plants (with no feedforward controller) in the graphical form of Fig. A) 

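5.2. Appendix B: Transcription of higher order differential equations into first
order differential equations. Let us assume that the dynamical equation of the plant
has the following form:

(28) $\frac{\partial^{n} q}{\partial t^{n}} = f\left(q, \frac{\partial q}{\partial t}, \frac{\partial^{2} q}{\partial t^{2}}, \ldots, \frac{\partial^{n-1} q}{\partial t^{n-1}}\right)$

and introduce the notations

(29) $x = (z_1, \ldots, z_n)$

(30) $z_1 = q$

(31) $z_k = \frac{\partial^{k-1} q}{\partial t^{k-1}} \quad (k = 2, \ldots, n)$

then Eq. (28) can be rewritten as

(32) $\dot{x} = \frac{dx}{dt} = g(x)$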
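
The transcription can be sketched for a scalar coordinate q: the first component of the state is q, each further component is the next time derivative, and the last row of g applies the plant function f. The helper names, the hand-written Runge-Kutta integrator and the harmonic-oscillator test plant are hypothetical choices for this sketch.

```python
import numpy as np

def to_first_order(f, order):
    """Return g with dx/dt = g(x) for d^n q/dt^n = f(q, q', ..., q^(n-1)), scalar q."""
    def g(x):
        dx = np.empty(order)
        dx[:-1] = x[1:]      # derivative of each component is the next component
        dx[-1] = f(*x)       # highest derivative is given by the plant function
        return dx
    return g

def rk4(g, x0, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of dx/dt = g(x)."""
    x = np.asarray(x0, dtype=float)
    dt = t_end / steps
    for _ in range(steps):
        k1 = g(x)
        k2 = g(x + 0.5 * dt * k1)
        k3 = g(x + 0.5 * dt * k2)
        k4 = g(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

# Hypothetical second order plant: q'' = -q with q(0) = 1, q'(0) = 0.
g = to_first_order(lambda q, q_dot: -q, order=2)
x_final = rk4(g, [1.0, 0.0], t_end=2.0 * np.pi, steps=2000)
```

After one full period the state returns to its initial value up to integration error, which confirms that the first order system reproduces the second order dynamics.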

5.3. Appendix C: Relaxed deconvolving needs of signals temporally convolved
and mixed by reconstruction networks. It is easy to show by insertion that the
dynamical equation

(33) $\dot{h}(t) = W(x - Qh)$

of the reconstruction network of Fig. E^B) gives rise to the following solution

(34) $h(t) = \int_{-\infty}^{t} \exp\left(-WQ(t - t')\right) W x(t')\, dt'$

The condition of convergence is that $WQ$ is positive definite.

Solution (34) forms a temporal convolution, which needs to be removed for proper
maximization of information transfer. Blind source deconvolution (BSD), in general, is
demanding in terms of the number of neurons and connectivity structure jHE]. However,
the convolution of Eq. (34) can be simplified by mixing. Diagonalizing matrix $WQ$ as
$WQ = U^{T}\Lambda_{diag}U$ with $UU^{T} = U^{T}U = I$, mixing $h$ by $U$, denoting the mixed quantity
by $\chi\ (= Uh)$ and introducing the notation $\xi = UWx$, one has

(35) $\chi(t) = \int_{-\infty}^{t} \exp\left(-\Lambda_{diag}(t - t')\right) \xi(t')\, dt'$

For BSD, Eq. (35) requires only diagonal delay lines and BSD is to be followed by a
separate ICA transformation, a much relaxed set of conditions. For some simulations, and
for arguments on convergence-divergence patterns in the dentate gyrus, see [76].
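
The decoupling by mixing can be checked numerically. The sketch below assumes a symmetric positive definite product WQ (obtained here by the hypothetical choice W = Q^T), an arbitrary smooth input and explicit Euler integration; none of these choices come from the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 6, 3
Q = rng.normal(size=(n_in, n_hid))    # hypothetical top-down matrix
W = Q.T                               # hypothetical choice: WQ symmetric positive definite
eigvals, V = np.linalg.eigh(W @ Q)    # WQ = V diag(eigvals) V^T
U = V.T                               # hence WQ = U^T diag(eigvals) U

dt, steps = 1e-3, 5000
h = np.zeros(n_hid)                   # hidden activity of the full network
chi = np.zeros(n_hid)                 # mixed activity, one scalar filter per component
for k in range(steps):
    x = np.sin(0.01 * k + np.arange(n_in))   # arbitrary smooth input
    xi = U @ (W @ x)
    h = h + dt * (W @ (x - Q @ h))           # coupled dynamics of Eq. (33)
    chi = chi + dt * (xi - eigvals * chi)    # decoupled (diagonal) dynamics

mixed_h = U @ h
```

The mixed hidden activity coincides with the output of the independent scalar filters, so each component can be deconvolved with its own delay line.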